Now that you know how to create graphics and visualizations in R, you are armed with powerful tools for scientific computing and analysis. With this power also comes great responsibility. Effective visualizations is an incredibly important aspect of scientific research and communication. There have been several books (see references) written about these principles. In class today we will be going through several case-studies trying to develop some expertise into making effective visualizations.
The worksheet questions for today are embedded into the class notes.
You can download this Rmd file here
Note, there will be very little coding in-class today, but I’ve given you plenty of exercises in the form of a supplemental worksheet (linked at the bottom of this page) to practice with after class is over.
Fundamentals of Data Visualization by Claus Wilke.
Visualization Analysis and Design by Tamara Munzner.
STAT545.com - Effective Graphics by Jenny Bryan.
ggplot2 book by Hadley Wickam.
Callingbull.org by Carl T. Bergstrom and Jevin West.
Write some notes here about what “effective visualizations” means to you. Think of elements of good graphics and plots that you have seen - what makes them good or bad? Write 3-5 points.
Question: Evaluate the strength of the claim based on the data: “German workers are more motivated and work more hours than workers in other EU nations.”
Very strong, strong, weak, very week, do not know
Based on the plot, which only showed point-estimates, one may agree with the claim above. However, once 95% CIs are added, the difference may not be statistically different, and the claim above may not be true. Also, the difference between Germany and other countries above average is less than 1 hour. Another question is the countires listed in the graph is representative of EU?
X-axis should have started from 0 not from 36, which amplified the difference. Even if it started from 0, it is a very poor use of space. Normally, you don’t add actual numbers when graphs are on a grid.
In addition, longer hours do not mean they’re more motivated. There could be other reasons why they have longer working hours on average than other EU countries.
Question: For the years this temperature data is displayed, is there an appreciable increase in temperature?
Yes, No, Do not know
I do not know because I do not know what is considered as a significant change in temperature in meteology or whatever the field of importance. Also I’d like to see 95% confidence band to see.
Year was not labelled. Y-axis did not have to start at 0. Apparently 2 degree difference is a significant change.
Question: Evaluate the strength of the claim based on the data: “Soon after this legislation was passed, gun deaths sharply declined.”
Very strong, strong, weak, very week, do not know
The y-axis was reversed, which is counter-intuitive to most readers. Again, there was not enough information to determine if the claim is true or not.
Choice of colour makes difference in interpretation of the figure. The y-axis is not to be upside down.
Great resource for selecting the right plot: https://www.data-to-viz.com/ ; encourage you all to consult it when choosing to visualize data.
Left hand side (0-0.6%) vs. right hand side (86 to 95%); these 2 y-axes differ by 100 fold. In fact, the autisum prevalence has stayed constant over time.
Avoid
We will be filling these principles in together as a class
Instructions: Here is a code chunk that shows an effective visualization. First, copy this code chunk into a new cell. Then, modify it to purposely make this chart “bad” by breaking the principles of effective visualization above. Your final chart still needs to run/compile and it should still produce a plot.
How many of the principles did you manage to break?
Did you know that you can make interactive graphs and plots in R using the plotly library? We will show you a demo of what plotly is and why it’s useful, and then you can try converting a static ggplot graph into an interactive plotly graph.
This is a preview of what we’ll be doing in STAT 547 - making dynamic and interactive dashboards using R!
library(tidyverse)
## -- Attaching packages ------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.1 v purrr 0.3.2
## v tibble 2.1.3 v dplyr 0.8.3
## v tidyr 1.0.0 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## -- Conflicts ---------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(gapminder)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
p<-ggplot(gapminder, aes(x=gdpPercap, y=lifeExp, color=continent))+geom_point()
p %>% ggplotly() #Interactive plot
#plot_ly syntax
gapminder %>% plot_ly(x= ~gdpPercap, y= ~lifeExp,
color= ~continent,
type='scatter',
mode='markers') #'line' for line
#it should be rendered as html unless you save an img file
Sys.setenv("plotly_username"="julieagnes") #double check function name
Sys.setenv("plotly_api_key"="fxpJFGku76V59wmye6Px")
api_create(p, filename='cm013-plotly-example')
## Found a grid already named: 'cm013-plotly-example Grid'. Since fileopt='overwrite', I'll try to update it
## Found a plot already named: 'cm013-plotly-example'. Since fileopt='overwrite', I'll try to update it
You are highly encouraged to the cm013 supplemental exercises worksheet. It is a great guide that will take you through Scales, Colours, and Themes in ggplot. There is also a short guided activity showing you how to make a ggplot interactive using plotly.